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, Abstract 

[ We provide analytical functions approximating J dx, the basis of which is the 

D ■ kink soliton and which are both accurate (error < 0.2%) and simple. We demonstrate 

our results with some applications, particularly to the generation of Gaussian random 
^T) • fields. 

(N : 
> ■ 

O ■ 1 Introduction 

(N : 
T— I . 

I There is an inherent asymmetry between integration and differentiation which makes inte- 

\^ ' gration somewhat of an art form, and which is perhaps best exemplified by the lack of an 

■ elementary indefinite integral of the celebrated Gaussian: 

a : /exp (^^^^) dx (1) 

5^ ' The fact that such an integral does not in fact exist follows from the work of Laplace 
O ■ However, the Gaussian integral is fundamental, finding applications in statistics, error 

, theory and many branches of physics. In fact, anywhere one has Gaussian distributions, 

. ^ I cummulatives of these distributions will involve the above integral. Only special case definite 

^ ' integrals of are known, the most famous being: 



oo 



-^'/'^'dx = ^ (2) 
In addition there is the series expansion S: 







^tt(A:-l)!(2A;-l) 

Now in practise one can evaluate the integral accurately by numerical methods or tables, but 
in many cases it would be preferable to have an analytical solution, even if it were not exact, 
as long as the maximum error were very small and the approximation were simple 0. 

It turns out that there exists a function well known in the analysis of nonlinear partial 
differential equations whose derivative is very close to Gaussian - the kink soliton: 

cl){x) = ^tanh(6x - cf3) (4) 
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^Several rational function approximations exist but they are rather complicated Q. 
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with derivative: 

X(x) = ^6(l-tanh2(6x-c/3)) (5) 

where A, b, c, and f3 are all real constants. The graphs of and x{x) shown in figure (1). 

The kink soliton is the positive, time- independent, topological solution to the non- linear 

1 + 1 dimensional partial differential equation: 

where a subscript denotes partial derivative with respect to that variable. The solution to this 
equation is topological because the boundary conditions at x = ±00 are different. Leaving 
the physical origin of (p behind, it is interesting to examine the series expansion of tanh(a;): 

00 oSfcfO^'' - I) TT 

tanh(x) = J2 /oMi B^kX^''-^ , valid for x < - (7) 
k=i ^^'^>- ^ 

which should be compared with eq. @ for / e~^^dx. Here Bk are the Bernoulli numbers with 
generating function t/(e* — 1). We see that although the coefficients differ in each case, the 
powers of x in the expansions are identical. Further both x{x) and have the property 
that their derivatives can be re-expressed in terms of themselves and </)(x) or powers of x 
respectively. These observations shed some light on the foundations of the approximation. 
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Figure 1: Plot of (solid line), x(x) (dotted line) and tanh(x) (dashed line) which is the 
kink soliton. 

^One can consider a one-parameter family of approximations to the Gaussian given by replacing x ^ x'^ 
in eq. (|^) which give better fits when e 7^ 1, but which do not have indefinite integrals as far as is known to 
the author. 
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2 Details of the approximation 



Turning to practical issues, we are left with choosing the constants, A, b, c to optimise the 
approximation of eq. (|l]). We need three constraints to fix the three parameters. First we 
require that the Gaussian and x(^) have the same symmetry axis. This requires the argument 
of tanh to vanish at = P which immediately implies from eq. (P) that c = b. 

At this stage we have a choice, dependent on whether we are interested in an approximate 
solution for small or large x. For large x, a constraint is obviously that our new approxima- 
tion, (j){x), must give exactly the same result as eq. (|2|) when differenced at infinity and the 
origin. This will ensure convergence of our approximation. Since tanh(x) ^ 1 as x — > oo, 
and tanh(O) = 0, this implies from eq. that: 




Finally we can impose that x(.x) = e-(^-^)V'^' at some point, i.e, we match the deriva- 
tives. We will choose x = (3 as the simplest. This gives: 

Ab = l^b= 

VVTfJ 

In fact the two are equal at another point as can be seen from figure (1). Our analytical 
approximation, which is very accurate for large x, is therefore: 

= tanh ( -^{x -(3)) ~ / e-("-^)'/'^'dx (8) 

where in this paper ~ is understood as meaning asymptotic convergence, as x — > oo and 
bounded error Vx. From figures (1,2) we see that the kink derivative underestimates the 
Gaussian at small (x — and overestimates it at large (x — 

Alternatively if one is interested in Jq e~^'^ I"'^ dx where u < 4cr say, then this will not be 
good enough, since the error in our approximation is strongly confined to small x. Instead 
we can impose that ^{x) must give the exact result, not at infinity, but at the end of the 
interval, i.e. at u. Thus we impose: 

Atanh(6(u - = e-(^-^)'/^'dx (9) 

In addition we need to match the derivatives x^^*) = e~*^^*~'^)^/'^^) at some point x* as 
before, and then solve the equations for A, b. It is an open question which matching point 
yields the best results. For illustrative purposes we choose x = /3 and again find A = 1/b, 
so that substituting in eq.(^) gives us a nonlinear root-finding problem for A. The right- 
hand side can be found for example, from tables of the error function, erf(x). This yields an 
approximation which is exact at x = -u and hence a much better approximation for small x, 
but which is invalid for x ^ u. The extension to cases with variable lower limit of integration 
is obvious and will not be considered. 
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Figure 2: Plot of the difference between and x(^)- 




1 2 3 4 5 

Figure 3: Plot of the error function, and the soliton approximant, <i){x). The maximum 
difference occurs at a; = 1.12 and is 3.91%. The error drops below 1% for x > 2.3 and 
converges exponentially to zero. 
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One might be tempted to generahse eq. (^) to a one-parameter family of approximations 
to the error function: 

Ap{x) = AtanhP{bx) (10) 

which have derivative: 

A'p{x) = AbptanhP-\bx)sech'^{bx) (11) 

However, since for p ^ 1, Ap(0) = 0, they are not really suitable as approximations to 
a Gaussian. Rather they are skewed distributions with maxima at x > 0. It turns out 
however, that they will be useful later. 

For testing our approximation we will use the <j){x) valid for large x, denoted ^(x)/,, given 
by eq. (^). The crucial question is of course, how good is this approximation ? It turns out 
that it is very good in most cases, as can be seen from figures (3) and (4). The maximum 
error from using (P{x)l is 3.91% at x = 1.12. However as discussed earlier, if one is interested 
in the result for small x, and xi is small, then this is not the best approximation to use. 
In practise, the error drops off' very quickly due to the exponential nature of tanh(x). For 
example, the error in estimating erf(x) drops below 1% for x > 2.3 and at x = 5 the error is 
2.51 X 10~^. The error as a function of x is plotted in figure (4). 



3 Improving the approximation 

The shape of figure (4) is, in fact, rather startling because it is a very simple shape. From 
the graph it has a single local maximum and hence two points where the concavity changes. 
Hence although it cannot be written down explicitely in terms of elementary functions 0, it 
can be approximated very closely. Several fitting shapes were tried, such as the log-normal 
and Poisson distributions, but the best was found to be a generalised Maxwell-distribution: 

£;(x) = aix"exp(- — ) (12) 

For the case used in the figures, that of erf(x), the best parameters for reducing the maximum 
error (i.e. minimising w.r.t. the sup-norm || • ||oo) were (see figure (5)): 

ai = 0.062 , n = 2.27 , aa = 1.43 (13) 

which reduced the maximum error to 0.15%. It is also likely that our choice of function and 
parameters for E{x) is not optimal, since formal optimisation was not used, but was based 
rather on a numerical investigation of the parameter space {ai,a2,n}. 

Further, since the required E{x) is a skewed Gaussian with maximum at non-zero x we 
can profitably employ the functions given by eq. (pT|), originally introduced to model the 
Gaussian, as fits for the error. In this case our approximation becomes: 



2 

tanh(^^x) -|- (03 tanh^(x))' 
vr 



(14) 



where ' denotes derivative w.r.t. x. For 03 = 0.23 and p = 9.7 the error is at most 9 x 10 ^. 
By suitable generalisation of the second term it is possible to increase the accuracy to the level 



5 



0.04 



0.035 



0.03 



0.025 



0.02 



0.015 



0.01 



0.005 




Figure 4: The difference of erf(x) and (f){x)L- Tliis is closely approximated by log-normal 
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distributions or generalised Maxwellians of the form aix^e"^ 
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Figure 5: The final error after modeling figure (4) by the generalised Maxwell distribution of 
eg. (|l2|) ■ The maximum error is about 0.15%. 
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of the generalised Maxwell distribution, but for simplicity and because of its suggestiveness, 
we leave it in the above form. 

In the case of the error function we have explicitely that (/? = 0): 

2 

erf(2;) ~ tanh(^^x) + E{x) (15) 



where erf(a;) = <I>(x) = /q e ^^du is the error function. Similarly the complementary 

error function is given by: erfc(x) = 1 — tanh(;^a;) — E{x). 

4 Moments of the soliton 



A fundamental feature of a Gaussian distributed random variable is that all moments above 
the second, such as the skewness, are zero. From this it follows that the sum of error 
distributed random variables is itself error distributed. A natural question to ask is how well 
the soliton approximation preserves this feature. 

To make this more precise: given the distribution P{x), we may define the partition 
function ^( J) via: 

Z{ J) = J P(x)e-^^dx (16) 

From the "free energy" F{J) = InZ(J) we may now define the n-th moment, M„, of P{x) 
as: 

Mn = ^F(J)|,=o (17) 

Thus in the case of a Gaussian distribution with zero mean, it is easy to show that the 
free energy is a quadratic function of J. Hence the only non-zero moment is the second, i.e. 
the variance, as claimed above. In the case of the soliton approximant we have: 

Z{J) = y [1 - tanh'^{2x/^/^a)]e■^''dx (18) 

which is unfortunately not known analytically, so we resort to numerical analysis. Using the 
Gaussian case as a testbed we approximated the free energy with an 8-th degree polynomial: 

F(J) ~ ^Lo^n^ (19) 

For a Gaussian a„ = 0, n > 3. Using a least-squares method, the error, i.e. the largest 
a„ coefficient which is zero in the exact case but non-zero in the fit, was = 2.535 x 10~^. 
Each subsequent coefficient was roughly an order of magnitude smaller than the preceding 
one. 

In the case of the soliton approximation, given by eq. (^), the error was 2.309 x 10^^ 
again for the cubic term, and again with roughly a„+i ~ q„/10. 



''We use the notation Z{J) because of its ubiquitous use in statistical physics. In the case where a; is a 
function, Z{J) becomes a path-integral and derivative becomes functional derivative in eq. (p^. 
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"3 


0:4 


"5 


-2.535 X 10-« 
-2.309 X 10~3 


4.202 X 10-y 
4.308 X 10-^ 


-4.002 X 10"^'^ 
-4.389 X 10-5 






as 


2.182 X 10"^^ 
2.586 X 10-6 


-6.322 X 10"^^ 
-8.144 X 10^8 


7.532 X 10"^^ 
1.071 X 10-9 



Table (1) shows a comparison between the coefficients of the free-energy polynomials for 
the terms higher than cubic for the exact Gaussian and the soliton approximant xi^)- An 
interesting thing to note is that, although the accuracy is at the level one might expect, i.e. 
~ 10~^, the pattern of the terms is identical; namely both the signs and the decrease in the 
coefficients have the same behaviour in both cases. This suggests that numerical errors will 
be "coherent", i.e. the errors one has from numerical integration of the Gaussian will be 
of the same nature as those one obtains from the soliton approximation. This is perhaps 
obvious given the similarity of their power series (see eq.s (§,|7|)) but will not be true for other 
approximants in terms of e.g. rational functions 

We leave this discussion by noting that inclusion of E{x), via e.g. eq. ([l2|) , in the 
calculation of moments will reduce the above errors considerably, presumably by a factor of 
at least 10^. 



5 Applications 

Let us now consider a small sample of applications. A primary example is in the theory 
of statistics. If we have a uniformly distributed random variable x ^-nd we desire a random 
variable y with statistics given by a distribution /, first define the integral F{x) = Jq f{x)dx- 
Then y = F~^{x) will have the same distribution as /, where denotes the inverse of F, 
on the interval [^-^(O), ^-^(x)]. 

In particular if, as is often the case, we want to generate a realisation of a Gaussian 
random distribution, / = exp(— x^/a^), then with our approximation, F{x) = (p{x) (we have 
dropped the error correction term E{x) for simplicity) and the inverse (j)~^{x), gives us our 
random variable. In this case if y = (pix), then: 



= ^tanh"i(^x) (20) 
2 V vrcr 

which has the same form as <p{x) with the replacement tanh — > tanh~^ so that both the 
integral and inverse are essentially trivial. This avoids the necessity of using traditional 
Monte Carlo methods to calculate Gaussian distributions. 

A related problem occurs in the study of structure formation from gravitational collapse 
from Gaussian initial conditions, a standard assumption. The Press-Schecter formalism 
gives the cummulative mass function /(> M), which is the number of objects (such as 
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galaxies) with mass greater than M: 



/(> M) = 1 - erfc 



V2a{M, 



(21) 



where 6c, z £ R and a is the variance of the distribution. This can be estimated immediately 
using eq. (0^). 



One place where error functions are ubiquitous is in diffusion theory, since the decaying 
Gaussian is a solution to the standard diffusion equation. In the case where there is an 
extended distribution of diffusing material, situated at x < for example, the solution is 
instead given by: 

C{x,t) = ^eTic{-=\ (22) 



—erfc ( — ; — ^ 
2 \2VDiJ 

where D is the diffusion constant. Indeed the error function appears any time there is a sum- 
mation of the effects of a series of line sources each of which has an exponential distribution, 
both in finite and infinite media, as discussed in great detail in Q. 

Further, the error function can be related to special values of the degenerate hypergeo- 
metric function, iFi{a;j;z). In particular: 

,13 2\ 1/2 > 



Our final example comes from the theory of parabolic cylinder functons, Dp{z), which 
are solutions to the differential equation: 



(Pu , 1 z^. 



(23) 



with u = Dp{z) and for integer values ofp = n, they are related to the Hermite polynomials, 
Hn{z) by Dn{z) = 2~"/-^e~^ ^^Hn{-j=)- Finally we may write, for the special cases of n = 
-1,-2: 



D.(z) 



D.iz) 



1 — tanh(W —z) 
vr 



-e-'^''^ - z{l-ian\iUI-z)) 

TT 



(24) 

(25) 
(26) 



6 Conclusions 



In this Letter we have presented a function approximating erf(3;) to better than 4% V x, 
with exponential convergence as x — > oo. This solution is simply the kink soliton, </>(x) = 
tanh(2x/-^/7r) and can be optimised for accuracy if the error function at small values of the 
argument is required. 
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Further we have found a solution with maximum error of 0.15% by adding a generaUsed 



Maxwell distribution to the kink soliton, equations (12), ([14|). Future work should be aimed 
at finding truly optimal solutions. Finally a few applications were discussed, particularly to 
diffusion dynamics and to the generation of Gaussian random fields. 

The author would like to thank Prof. Domb, Claudio Scrucca and Lando Caiani for illumi- 
nating discussions and Stefano Bianchini for a very useful critical reading of the manuscript. 
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